Forssman heterophilic glycolipid antigen has structural similarity to the histo-blood group A antigen, and the GBGT1 gene encoding the Forssman glycolipid synthetase (FS) is evolutionarily related to the ABO gene. The antigen is present in various species, but not in others including humans. We have elucidated the molecular genetic basis of the Forssman antigen negativity in humans. In the human GBGT1 gene, we identified two common inactivating missense mutations (c.688G>A [p.Gly230Ser] and c.887A>G [p.Gln296Arg]). The reversion of the two mutations fully restored the glycosyltransferase activity to synthesize the Forssman antigen in vitro. These glycine and glutamine residues are conserved among functional GBGT1 genes in Forssman-positive species. Furthermore, the glycine and serine residues represent those at the corresponding position of the human blood group A and B transferases with GalNAc and galactose specificity, respectively, implicating the crucial role the glycine residue may play in the FS α1,3-GalNAc transferase activity.

In 1911, Forssman discovered that rabbits injected with a suspension of kidney tissue from guinea pig or horse, but not from cow or rat, produced antibodies that were capable of hemolysing sheep erythrocytes in the presence of complement. The heterophilic antigen on sheep erythrocytes was later named the Forssman antigen after its discoverer. This antigen is present in a variety of species, but not in others. Species were, therefore, categorized into Forssman-positive or negative species, depending on the antigen expression. Mouse and dog were also classified as Forssman-positive, and human and other anthropoid apes as Forssman-negative. The Forssman antigen was also found in species other than mammals. For example, chicken, turtles, and carp express the antigen, whereas goose, pigeon, and frog lack the antigen. Although the Forssman antigen was initially recognized on sheep erythrocytes, the expression on erythrocytes is rare. In many species its expression is restricted to tissues. However, in certain species it is observed in both erythrocytes and tissues (as in chicken) and in others only on erythrocytes (as in sheep). Its tissue distribution was studied in detail in mouse. Perivascular macrophages/dendritic cells located close to microvessels express this antigen, whereas fibroblasts, endothelial cells, pericytes, myocytes, and neural tissue elements do not. This may explain why anti-Forssman antibody per se fails to induce hyperacute rejection in a xenotransplantation model of mouse heart xenografts to rat. Tissue-specific expression of the Forssman antigen was also observed in other Forssman-positive mammals. Its expression was shown to undergo changes during embryonic development and cellular differentiation as well.

The Forssman antigen is a glycolipid with the structure GalNAcα1→3GalNAcβ1→3Galα1→4Galβ1→4Glcβ1→1Cer. The terminal N-acetyl-D-galactosamine (GalNAc) residue bound by α1-3 glycosidic linkage is the reason for immunological similarity to the blood group A antigen whose immunodominant structure is GalNAcα1→3(Fucα1→2)Galβ1-. As the oligosaccharide structure indicates, the Forssman antigen is synthesized through a series of glycosylation reactions. The enzyme responsible for the last step of its biosynthesis is Forssman glycolipid synthetase (FS: EC 2.4.1.88). This enzyme catalyzes the transfer of GalNAc from UDP-GalNAc nucleotide-sugar to the terminal GalNAc residue of globoside (GalNAcβ1→3Galα1→4Galβ1→4Glcβ1→1Cer) by α1-3 linkage. The FS activity was demonstrated to be present in tissues from various Forssman-positive species.

In 1996, the canine GBGT1 gene cDNA encoding FS was cloned by the expression cloning method. The gene exhibited sequence homology to previously cloned bovine and murine GGTA1 genes for α1,3-galactosyltransferases that synthesize the α1,3-galactosyl-epitope (Galα1→3Galβ1→4GlcNAc-) and A and B alleles of the human ABO gene [MIM 110300] encoding the blood group A and B transferases that catalyze the final reactions of biosynthesis of the histo-blood group A and B antigens, respectively. In humans, the GGTA1 gene became a pseudogene (GGTA1P) and the α1,3-galactosyltransferase activity was lost [MIM 104175]. Later, the same group also cloned the rat A3GALT2 gene cDNA encoding the isogloboside b3 synthetase (iGb3S), another enzyme with α1,3-galactosyltransferase specificity, which acts on lactosylceramide to form iGb3 (Galα1→3Galβ1→4Glcβ1→1Cer), initiating the biosynthesis of the isoglobo-series of glycosphingolipids. The sequence homology among those enzymes illustrated that those genes have evolved from the same ancestral gene by duplications followed by divergence. Those enzymes are categorized in the same GT6 family of α1,3-Gal(NAc) transferases in the CAZy database that houses structurally-related catalytic and carbohydrate-binding modules of enzymes that create, modify, or degrade glycosidic bonds. A full-length human GBGT1 gene cDNA was later cloned, but no indel mutations were found [MIM 606074]. The GBGT1 mRNA expression was observed in a variety of human tissues ranging from heart, lung, skeletal muscle, kidney, pancreas, spleen, small and large intestines, peripheral leukocytes, prostate, ovary, to placenta. Because the human GBGT1 cDNA construct failed to produce the Forssman antigen, a structural defect was suspected. However, the basis of the human Forssman negativity was unsolved. Commemorating the 100th anniversary of the discovery of Forssman antigen, we initiated the exploration to decipher the basis of Forssman antigen negativity, and we have finally decoded the mystery.



Results 

Potential molecular bases of the Forssman antigen negativity in non-human species. Figure 1 shows the GBGT1 genes and their encoded proteins from various species. We have primarily examined them in silico and deduced the potential mechanisms of Forssman negativity in cow, macaque, pig, rat, rabbit, and frog species. In some cases we have also performed minor experiments of PCR and nucleotide sequencing. Important amino acid sequence changes and likely mechanisms are shown in the rightmost 2 columns. Compared with the FS protein sequences of the Forssman-positive mammals, the bovine (Bos taurus) protein lacks 11 amino acids due to an additional splicing. Furthermore, the second glycine position of the consensus FYYGGA sequence, which is conserved in all the Forssman-positive species whose sequences have been determined, is replaced by arginine (FYYGRA). We think that these alterations are the primary cause of Forssman negativity in ox/cow because the corresponding sequences of blood group A transferase (FYYLGG or FYYAGG), B transferase (FYYMGA), and α1,3-galactosyltransferase (FYYHAA) were previously shown to be critical for the recognition and binding of the donor nucleotide-sugar substrates by specificity and activity analyses of the A/B transferase chimeras and amino acid substitution constructs, and separately



Species Name (Generic Name) | Forssman Antigen a) | Ensembl Gene Name | Ensembl Peptide Name | A.A. No. b) | Important A.A. Seq. c) | Remarks d)
--- | --- | --- | --- | --- | --- | ---
Homo sapiens (Human) | − | ENSG00000148288 | ENSP00000361110 | 347 | FYYGGA | See Figures 2 & 3. Two inactivating missense mutations
Pan troglodytes (Chimpanzee) | − | ENSPTRG00000021510 | ENSPTRP00000036809 | 347 | FYYGGA | See Figure 2. Two inactivating missense mutations
Gorilla gorilla (Gorilla) | − | ENSGGOG00000003448 | ENSGGOP00000003390 | 347 | FYYGGA | See Figure 2. Two inactivating missense mutations
Pongo abelii (Orangutan) | − | ENSPPYG00000019724 | ENSPPYP00000022121 | 345 | FYYGGA |
Macaca mulatta (Macaque) | − | ENSMMUG00000015277 | ENSMMUP00000020033 | 343 | FYYGGK | The sequence important for sugar binding is not conserved.
Mus musculus (Mouse) | + | ENSMUSG00000026829 | ENSMUSP00000127071 | 347 | FYYGGA | See Figures 2 & 3.
Rattus norvegicus (Rat) | − | | | | | A homologous sequence to the mouse GBGT1 gene exists in the corresponding chromosomal region between the Mrps2 and Ralgds genes. In addition to many nucleotide substitutions, there are several indels that make the encoded protein non-functional.
Cavia porcellus (Guinea pig) | + | ENSCPOG00000004874 | ENSCPOP00000004388 | 347 | FYYGGA | See Figure 2.
Oryctolagus cuniculus (Rabbit) | − | | | | | No GBGT1 gene is annotated. The BLAST search did not identify any. The PCR using several pairs of degenerate oligonucleotide primers failed to amplify the equivalent sequences. Rabbits may lack the homologous sequence possibly due to the loss of chromosomal region containing the GBGT1 gene.
Bos taurus (Cow) | − | ENSBTAG00000030319 | ENSBTAP00000040428 | 335 | FYYGRA | The sequence important for sugar binding is not conserved. Also 11 amino acids are missing due to an additional splicing.
Ovis aries (Sheep) | + | | | ? | FYYGGA | No database is currently available for the BLAST search. However, PCR using degenerate primers amplified a homologous sequence, which was partially sequenced and confirmed to be the GBGT1 gene.
Sus scrofa (Pig) | − | | | | FYYGGA | No GBGT1 gene is annotated. The BLAST search failed to identify any. However, PCR using degenerate primers amplified a homologous sequence, which was partially sequenced. Missense mutations are the likely cause of inactivity.
Equus caballus (Horse) | + | ENSECAG00000012442 | ENSECAP00000010230 | 347 | FYYGGA | See Figure 2.
Felis catus (Cat) | + | ENSFCAG00000013356 | ENSFCAP00000012386 | 342 | FYYGGA |
Canis familiaris (Dog) | + | ENSCAFG00000019864 | ENSCAFP00000029424 | 347 | FYYGGA | See Figure 2.
Gallus gallus (Chicken) | + | ENSGALG00000003340 | ENSGALP00000005275 | 343 | FYYGGA |
Meleagris gallopavo (Turkey) | ? | ENSMGAG00000006307 | ENSMGAP00000006329 | 344 | FYYGGA |
Taeniopygia guttata (Zebra Finch) | ? | ENSTGUG00000005430 | ENSTGUP00000005586 | 344 | FYYGGA |
Xenopus tropicalis (Frog) | − | | | | | No GBGT1 gene is annotated. The BLAST search did not identify any frog GBGT1 genes although several ABO genes were found. No GBGT1 gene equivalent is present.
Gadus morhua (Cod) | ? | ENSGMOG00000012375 | ENSGMOP00000013247 | 275 | YYYSAA | Exons 1-4 missing?
Gasterosteus aculeatus (Stickleback) | ? | ENSGACG00000000537; ENSGACG00000014813; ENSGACG00000016064 | ENSGACP00000000689; ENSGACP00000019565; ENSGACP00000021194 | 275; 346; 275 | YYYTAA; FYYCGA; YYYTAA | Exons 1-4 missing?; Exons 1-4 missing?
Oryzias latipes (Medaka fish) | ? | ENSORLG00000010140; ENSORLG00000008616 | ENSORLP00000012717; ENSORLP00000010810 | 358; 220 | YYYTAA; YYYSSE | No ATG? Exons 1-4 correct?; No ATG? A large deletion
Danio rerio (Zebrafish) | ? | ENSDARG00000005257; ENSDARG00000011283; ENSDARG00000019207; ENSDARG00000025275; ENSDARG00000035555; ENSDARG00000091936; ENSDARG00000091944; ENSDARG00000091969; ENSDARG00000092718; ENSDARG00000094600 | ENSDARP00000018621; ENSDARP00000015549; ENSDARP00000010651; ENSDARP00000018900; ENSDARP00000051540; ENSDARP00000112590; ENSDARP00000120433; ENSDARP00000122748; ENSDARP00000114822; ENSDARP00000121282 | 277; 302; 304; 316; 342; 137; 342; 275; 301; 140 | YYYTAA; YYYTAA; YYYCGA; YYYTAA; YYYGGA; YYYGGA; YYYGGA; YYYGGA; YY | Exons 1-4 missing?; Exons 1-4 missing?; No ATG? Gene fragment?; Exons 1-4 missing?; Short N-terminal; No ATG? Gene fragment?

Figure 1 | GBGT1 genes and encoded proteins from Forssman-positive and negative species. The annotated entries of the GBGT1 genes and their encoded proteins were extracted from the Ensembl database and categorized by species. a) The Forssman positivity and negativity are indicated by (+) and (−), respectively, when it is known for the species, whereas a question mark (?) shows unknown. b) In order to facilitate the identification of partial genes and pseudogenes, the numbers of the deduced amino acid residues of the individual proteins are indicated. c) The amino acid residues corresponding to the codons between 258 and 263 of the mouse FS were also extracted, and the residues that are different from the consensus FYYGGA sequence are indicated in bold. d) Additional useful information is added in the Remarks. Potential mechanisms of Forssman negativity deduced from the in silico analysis are shown in bold.
by the determination of three-dimensional structures of those transferases. The same argument may also be applied to macaque (Macaca mulatta) with the FYYGGK sequence. The replacement of the alanine by lysine may exert a devastating effect on the enzyme activity because the same substitution at the corresponding position of blood group B transferase abolished its activity. In the case of pig (Sus scrofa), the genome database is available, but no GBGT1 genes have been annotated. We performed PCR using several pairs of degenerate oligonucleotide primers, which were designed to amplify the GBGT1 gene sequences from diverse species. We could amplify DNA fragments, and the nucleotide sequencing confirmed that the sequences were from the porcine GBGT1 gene. The FYYGGA sequence was conserved; however, there were several amino acid substitutions within the sequenced region (M.Y., unpublished data). We suspect that a few of the missense mutations are the probable cause of its Forssman negativity although there may be additional and more critical mutations in the unsequenced region of the gene.
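The motif comparison described above for cow and macaque can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the analysis pipeline used in the study; the function name `motif_differences` and the dictionary layout are assumptions introduced here, while the motif strings are those quoted in the text and in Figure 1.

```python
# Consensus donor-sugar-binding motif of Forssman-positive species (Figure 1).
CONSENSUS = "FYYGGA"

# Motifs quoted in the text for two Forssman-negative species.
MOTIFS = {"cow": "FYYGRA", "macaque": "FYYGGK"}


def motif_differences(motif, consensus=CONSENSUS):
    """Return (1-based motif position, consensus residue, observed residue)
    for every position where the motif departs from the consensus.
    The function name and return shape are illustrative assumptions."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have the same length")
    return [
        (i, ref, obs)
        for i, (ref, obs) in enumerate(zip(consensus, motif), start=1)
        if ref != obs
    ]


for species, motif in MOTIFS.items():
    print(species, motif, motif_differences(motif))
```

For cow the only mismatch is the second glycine (motif position 5) replaced by arginine, and for macaque it is the alanine (motif position 6) replaced by lysine, matching the substitutions discussed in the text.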

Regarding rat (Rattus norvegicus) and rabbit (Oryctolagus cuniculus) species, we failed to amplify DNA fragments using the same pairs of oligonucleotide primers that amplified the pig sequence. The reason for the failure soon became apparent for the rat case. There was, in fact, a nucleotide sequence homologous to the mouse GBGT1 gene in the rat genome although the BLAST search using the amino acid sequence inquiry did not return any homologous sequences. This sequence was located on the p12 band of rat chromosome 3 between Mrps2 (mitochondrial ribosomal protein S2) and GNDS_RAT. Because the mouse GBGT1 gene is mapped on the A3 band of chromosome 2 between Mrps2 and Ralgds (ral guanine nucleotide dissociation stimulator) and because the rat GNDS_RAT is the Ralgds equivalent, the identified rat sequence seemed to be the GBGT1 gene ortholog. However, due to numerous nucleotide sequence differences, including several indels, the protein sequence is not conserved in rat, which explains its loss of Forssman antigen expression. The OryCun2 (GCA_000003625.1) rabbit genome assembly used for the BLAST search had a limited coverage of 7×, and only 82% have been anchored to chromosomes. Neither GBGT1 nor Mrps2 genes have been annotated. BLAST searches were not successful, either. Considering the failed PCR attempts, the homologous sequence may not exist in rabbit possibly due to chromosomal region loss, or even if it exists, like the rat sequence, the sequence may be quite dissimilar to functional GBGT1 gene sequences. For frog (Xenopus tropicalis), the BLAST search identified several ABO genes but no GBGT1 genes, suggesting the absence of the GBGT1 gene equivalent.

Identification of the missense mutations responsible for the human Forssman antigen negativity. For the loss of Forssman antigen expression in humans we assumed that missense mutations


M. musculus     MTRPRLAQGLAFFLLGGTGLWVLWKFIKDWLLVSYIPYYLPCPEFFNMKLPFRKEKPLQPVTQLQYPQPKLLEHGPTELLTLTPWLAPIV 90
C. porcellus    MSRRRLVLSLGLIVLASTGLWALWVYIENWLPVSHVPYYLPCPEIFNMKLQYKGEKPLPPVTQAQYPQPRLTQHRPTELLTLTPWLAPIV 90
E. caballus     MRRHRLAIGLGVCLLVGTVLCTLWVYVEDWLPVSYVPYYLPCPEIFNMKLQYKGEKLFQPVAQSRYPQPKLLEQRPTELLTLTPWLAPIV 90
C. familiaris   MRCRRLALGLGFSLLSGIALWSLWIYMETWLPFSYVPYYLPCPEIFNMKLQYKGEKPFQPVTRSPHPQPKLLEQRPTELLTLTPWLAPIV 90
H. sapiens      MHRRRLALGLGFCLLAGTSLSVLWVYLENWLPVSYVPYYLPCPEIFNMKLHYKREKPLQPVVWSQYPQPKLLEHRPTQLLTLTPWLAPIV 90
P. troglodytes  MHRRRLALGLGFCLLAGTSLSVLWVYLENWLPVSHVPYYLPCPEIFNMKLHYKREKPLQPMVWSQYPQPKLLEHRPTQLLTLTPWLAPIV 90
G. gorilla      MHRRRLALGLGFCLLAGTSLSVLWVYLENWLPVSYVPYYLPCPEIFNMKLHYKREKPLQPVVWSQYPQPKLLEHRPTQLLTLTPWLAPIV 90

Restriction site in this block: KpnI
M. musculus     SEGTFDPELLKSMYQPLNLTIGVTVFAVGKYTCFIQRFLESAEEFFMRGYQVHYYLFTHDPTAVPRVPLGPGRLLSIIPIQGYSRWEEIS 180
C. porcellus    SEGTFDPELLQHIYQPLNLSIGVTVFAVGKYTRFVQHFLESAEEFFMRGFQVHYYVFTHNPAAIPRVLLGPRRLLDIIPIHGYTHWEEIS 180
E. caballus     SEGTFNAELLQHIYQPLNLTIGLTVFAVGKYTHFVQHFLESAELFFMHGYRVCYYVFTDDPTAIPQVPLGPGRRLGIIPIQRHSRWEEIS 180
C. familiaris   SEGTFNPELLQHIYQPLNLTIGLTVFAVGKYTRFVQHFLESAEQFFMQGYQVYYYIFTNDPAAIPRVPLGPGRLLSIIPIQRHSRWEEIS 180
H. sapiens      SEGTFNPELLQHIYQPLNLTIGVTVFAVGKYTHFIQSFLESAEEFFMRGYRVHYYIFTDNPAAVPGVPLGPHRLLSSIPIQGHSHWEETS 180
P. troglodytes  SEGTFNPELLQHIYQPLNLTIGVTVFAVGKYTHFIQSFLESAEEFFMRGYRVHYYIFTDNPAAIPGIPLGPHRLLSSIPIQGHSHWEETS 180
G. gorilla      SEGTFNPELLQHIYQPLNLTTGVTVFAVGKYTHFIQSFLESAEEFFMRGYRVHYYIFTDNPAAVPGVPLGPHRLLSSIPIQGHSHWEETS 180

Restriction sites in this block: NcoI, BsiWI
M. musculus     MRRMETINKHIAKRAHKEVDYLFCVDVDMVFRNPWGPETLGDLVAAIHPGYFAVPRRKFPYERRQVSSAFVADNEGDFYYGGALFGGRVA 270
C. porcellus    MRRMEAISRHIAKKAHQEVDYLFCLDVDMVFHNPWGPETLGDLVAAIHPGYFTVSRRQFPYERRQISTAFVAENEGDFYYGGAVFGGRVA 270
E. caballus     TRRMEIISQHIAKRAHREVDYLFCVDVDMVFRNPWGPETLGDLVAAIHPGYYAVPRQQFPYERRHVSTAFVADGEGDFYYGGAVFGGRVA 270
C. familiaris   TRRMETISRHIAQRAHREVDYLFCVDVDMVFRNPWGPETLGDLVAAIHPGYYAVPRQQFPYERRHISTAFVAENEGDFYYGGAVFGGRVA 270
H. sapiens      MRRMETISQHIAKRAHREVDYLFCLDVDMVFRNPWGPETLGDLVAAIHPSYYAVPRQQFPYERRRVSTAFVADSEGDFYYGGAVFGGQVA 270
P. troglodytes  MRRMETISQHIAKRAHREVDYLFCLDVDMVFRNPWGPETLGDLVAAIHPSYYAVSRQQFPYERRRVSTAFVADSEGDFYYGGAVFGGQVA 270
G. gorilla      MRRMETISQHIAKRAHQEVDYLFCLDVDMVFRNPWGPETLGDLVAAIHPSYYAVPRQQFPYERRRVSTAFVADSEGDFYYGGAVFGGQVA 270

Restriction site in this block: MluI
M. musculus     RVYEFTRACHMAILADKANSIMAAWQEESHLNRHFIWHKPSKVLSPEYLWDERKPRPRSLKMIRFSSVKKNANWLRT 347
C. porcellus    NVYEFTRGCHMAILADKANGIMAAWQEESHLNRRLITHKPSKVLSPEYLWDDRKPVPSSLKLIRFSTLLKDTNWLRS 347
E. caballus     NVYEFTRGCHMAILADKANGIMAAWQEESHLNRRFISHKPSKVLSPEYLWDDRKRQPPSLKLIRFSTLDKDTSWLRS 347
C. familiaris   KVYEFTTGCHMAILADKANGIMAAWQEESHLNRRFISHKPSKVLSPEYLWDDRKPQPPSLKLIRFSTLDKATSWLRS 347
H. sapiens      RVYEFTRGCHMAILADKANGIMAAWREESHLNRHFISNKPSKVLSPEYLWDDRKPQPPSLKLIRFSTLDKDISCLRS 347
P. troglodytes  RVYEFTRGCHMAILADKANGIMAAWREESHLNCHFISNKPSKVLSPKYLWDDRKPQPPSLKLIRFSTLDKDISCLRS 347
G. gorilla      RVYEFTRGCHMTILADKANGIMAAWREESHLNRRFISNKPSKVLSPEYLWDDRKPQPPSLKLIRFSTLDKDISCLRS 347


Figure 2 | Amino acid sequence comparison of the GBGT1 genes between the Forssman-positive and negative species. The species compared are mouse, guinea pig, horse, and dog for Forssman-positive and human, chimpanzee, and gorilla for negative, respectively. The splicing junctions are indicated with the mark (^). The amino acid residues that are different between the Forssman-positive and negative species are shown with the (+) mark. The marks and the amino acid residues are in bold when the amino acids are conserved in the 4 Forssman-positive species. Additionally, they are underlined when conserved in cat and chicken (not shown). The restriction enzyme cleavage sites for the chimera constructions are also indicated.
might be responsible. To identify them, we took two different approaches: sequence analysis of the GBGT1 genes in the Forssman-positive and negative species and functional assays of FS chimeras between Forssman-positive mouse and Forssman-negative human and also of the in vitro mutagenized amino acid substitution constructs. We first aligned the amino acid sequences of the GBGT1 genes from 4 Forssman-positive mammals and 3 Forssman-negative anthropoid apes, including humans (Figure 2). The locations of the splicing junctions are marked with the symbol (^), and the positions of the restriction enzyme cleavage sites used for the later construction of mouse-human FS chimeras are also indicated. The amino acid residues that are different between the Forssman-positive and negative species are shown with the (+) mark. When the amino acids are conserved in the 4 Forssman-positive species, the marks, as well as the amino acid residues, are shown in bold. Additionally, when they are also conserved in Forssman-positive cat (Felis catus) and chicken (Gallus gallus) (not shown), the marks and the amino acids are also underlined (±). There are 3 such positions. The Forssman-negative human, chimpanzee, and gorilla share the c.536T>C [p.Ile179Thr], c.688G>A [p.Gly230Ser], and c.887A>G [p.Gln296Arg] substitutions.
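The relationship between the nucleotide-level and protein-level descriptions of these three substitutions can be checked with a short sketch. It assumes that coding positions are numbered from the A of the initiation codon and that the standard genetic code applies; the helper names `codon_of` and `compatible_codons` are hypothetical and introduced only for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_of(cdna_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def compatible_codons(ref_aa, alt_aa, pos_in_codon, ref_base, alt_base):
    """List reference codons for ref_aa that carry ref_base at pos_in_codon
    and encode alt_aa once that base is replaced by alt_base."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[pos_in_codon - 1] != ref_base:
            continue
        mutated = codon[: pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append(codon)
    return hits


# (coding position, ref base, alt base, ref residue, alt residue), one-letter codes
substitutions = [
    (536, "T", "C", "I", "T"),  # c.536T>C [p.Ile179Thr]
    (688, "G", "A", "G", "S"),  # c.688G>A [p.Gly230Ser]
    (887, "A", "G", "Q", "R"),  # c.887A>G [p.Gln296Arg]
]
for pos, ref, alt, ref_aa, alt_aa in substitutions:
    codon_number, offset = codon_of(pos)
    print(pos, codon_number, offset, compatible_codons(ref_aa, alt_aa, offset, ref, alt))
```

Each coding position falls in the codon named in the protein-level description (179, 230, and 296), and in each case at least one reference codon exists for which the stated base change yields the stated amino acid change.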

The sequence homology between functional mouse and non-functional human FSs is shown in Figure 3. Both the mouse and human GBGT1 genes encode a protein of 347 amino acid residues, of which 275 are identical (indicated with asterisk) and 72 are different. The known human SNP variations resulting in changes in the amino acid sequence are also shown. In the dbSNP-135 database, there is a nonsense codon termination SNP named rs35898523 (c.363C>A [p.Tyr121Ter]) with a minor allele frequency (MAF) of 0.048, which is marked with the symbol (=) in the figure. There are also 38 non-synonymous missense SNPs, among which one position is consistently different between the Forssman-positive and negative species. It is the SNP rs141041392 (c.886C>T [p.Arg296Trp]), which changes the arginine to tryptophan, but not to the glutamine of the mouse and other Forssman-positive species. This SNP is very rare as its MAF is listed as 0.00 (1/4,552) in the NHLBI-ESP: ESP Cohort Population (ss342287410).

We next constructed a variety of mouse-human FS chimeras, transfected DNA into Forssman-negative, but globoside-positive, African green monkey kidney fibroblast COS1 cells, and examined the Forssman antigen expression. Those cells transformed with an origin-defective mutant of simian virus SV40 were previously used to clone the canine GBGT1 gene cDNA encoding the functional FS. In Panel A of Figure 4, the proteins encoded by the mouse and human GBGT1 genes are schematically shown in dark and light grey, respectively. The locations of restriction enzyme cleavage sites used for chimera constructions, and the positions of important conserved amino acid substitutions are indicated. The same dark/light grey code was also used to indicate the origins of sequences in the chimeras and amino acid substitution constructs. The results of immunocytochemistry are shown side-by-side with the chimeric constructs in Panel B. The results of immunostaining with anti-Forssman antibody were adjusted by transfection efficiency using co-transfected GFP-positive cell percentages. The values are shown in percentage, using the results of transfection with the original mouse FS construct and a negative control to be 100 and 0, respectively. The original human GBGT1 gene construct, as well as several others, did not produce any Forssman antigen-positive cells (0%), whereas a decrease was observed with several other constructs. The presence of inactivating mutation(s) was evident within the NcoI-BsiWI region. Additional weakening mutations were suggested in the KpnI-NcoI and BsiWI-C-terminal regions.
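The scaling described above can be written out as a small sketch. The exact formula used in the study is not stated, so the version below is an assumption: staining is divided by the GFP-positive percentage to adjust for transfection efficiency, and the result is then rescaled linearly so that the mouse FS construct maps to 100 and the negative control to 0. The function names and all input numbers are hypothetical.

```python
def efficiency_adjusted(forssman_positive_pct, gfp_positive_pct):
    """Adjust the Forssman-positive cell percentage by transfection efficiency.
    Assumed form: a simple ratio to the GFP-positive percentage."""
    if gfp_positive_pct <= 0:
        raise ValueError("GFP-positive percentage must be greater than zero")
    return forssman_positive_pct / gfp_positive_pct


def relative_expression(sample, mouse_reference, negative_control):
    """Rescale an adjusted value so that mouse FS = 100 and negative control = 0."""
    span = mouse_reference - negative_control
    if span == 0:
        raise ValueError("reference and negative control must differ")
    return 100.0 * (sample - negative_control) / span


# Hypothetical counts, for illustration only (not data from the study).
mouse = efficiency_adjusted(20.0, 40.0)
negative = efficiency_adjusted(0.0, 40.0)
chimera = efficiency_adjusted(5.0, 50.0)

print(round(relative_expression(chimera, mouse, negative), 1))
```

A linear two-point rescaling of this kind keeps constructs comparable across transfections with different efficiencies, which is presumably why the values in Figure 4 are reported on a 0 to 100 scale.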

We then introduced the Ile179Thr (I179T), Gly230Ser (G230S), or Gln296Arg (Q296R) substitutions individually into the mouse FS expression construct. We also introduced the Thr179Ile (T179I), Ser230Gly (S230G), Arg296Gln (R296Q), Asn308His (N308H), and Cys344Trp (C344W) substitutions individually into the human GBGT1 gene expression construct. The results of Forssman antigen expression after DNA transfection of the amino acid substitution constructs are shown in Panel C. Compared to the original mouse FS, the construct containing I179T lost some, but not much, FS activity. However, the Q296R substitution significantly decreased and the G230S substitution almost abolished the activity. None of the human constructs exhibited strong Forssman antigen expression. However, some cells, many dead or dying, were stained positive after transfection with either the S230G or R296Q substitution construct. In order to examine the effects of those two substitutions further, we prepared the double substitution constructs (the mouse G230S & Q296R construct and human S230G & R296Q construct). The results are also shown in the same panel. The introduction of both the G230S and Q296R substitutions into the mouse FS completely eliminated


M. musculus  MTRPRLAQGLAFFLLGGTGLWVLWKFIKDWLLVSYIPYYLPCPEFFNMKLPFRKEKPLQPVTQLQYPQPKLLEHGPTELLTLTPWLAPIV 90
H. sapiens   MHRRRLALGLGFCLLAGTSLSVLWVYLENWLPVSYVPYYLPCPEIFNMKLHYKREKPLQPVVWSQYPQPKLLEHRPTQLLTLTPWLAPIV 90
Human SNP    s fg sicapvi
             A

M. musculus  SEGTFDPELLKSMYQPLNLTIGVTVFAVGKYTCFIQRFLESAEEFFMRGYQVHYYLFTHDPTAVPRVPLGPGRLLSIIPIQGYSRWEEIS 180
H. sapiens   SEGTFNPELLQHIYQPLNLTIGVTVFAVGKYTHFIQSFLESAEEFFMRGYRVHYYIFTDNPAAVPGVPLGPHRLLSSIPIQGHSHWEETS 180
Human SNP    T M - K L w g
             Q

M. musculus  MRRMETINKHIAKRAHKEVDYLFCVDVDMVFRNPWGPETLGDLVAAIHPGYFAVPRRKFPYERRQVSSAFVADNEGDFYYGGALFGGRVA 270
H. sapiens   MRRMETISQHIAKRAHREVDYLFCLDVDMVFRNPWGPETLGDLVAAIHPSYYAVPRQQFPYERRRVSTAFVADSEGDFYYGGAVFGGQVA 270
Human SNP    w tqqn m hplchi d

M. musculus  RVYEFTRACHMAILADKANSIMAAWQEESHLNRHFIWHKPSKVLSPEYLWDERKPRPRSLKMIRFSSVKKNANWLRT 347
H. sapiens   RVYEFTRGCHMAILADKANGIMAAWREESHLNRHFISNKPSKVLSPEYLWDDRKPQPPSLKLIRFSTLDKDISCLRS 347
Human SNP    g fw qknrc

Figure 3 | Amino acid sequence homology between the murine and human GBGT1-encoded proteins and the human SNP variations. The amino acid residues identical in mice and humans are indicated with asterisk (*). The single SNP variation resulting in codon termination is marked with (=), and 38 SNPs resulting in amino acid substitutions are shown.



SCIENTIFIC REPORTS | 2 : 975 | DOI: 10.1038/srep00975






[Figure 4 schematic: mouse Forssman glycolipid synthetase and human protein, residues 1-347. Restriction enzyme cleavage sites KpnI, NcoI, BsiWI, and MluI at positions 139/141, 214/215, 240/242, and 276/277. Residues at positions 179, 230, 296, 308, and 344: Ile, Gly, Gln, His, and Trp in mouse; Thr, Ser, Arg, Asn, and Cys in human.]

Figure 4 | Forssman antigen expression after DNA transfection. (A) Schematic representation of the mouse and human GBGT1 gene products. The mouse and human sequences are schematically shown in dark and light grey, respectively. The restriction enzyme cleavage sites used for the chimera constructions are indicated with and without parentheses for newly created artificial and pre-existing sites, respectively. The amino acid substitutions made are also shown. (B) The mouse-human FS chimeras and their FS activity. The constructs and the results of immunostaining with anti-Forssman antibody are shown. The values are shown in percentage with the expression of the original mouse FS construct and that of a negative control to be 100 and 0, respectively. The experiments were repeated with some constructs. The "ND" stands for "Not determined". (C) The in vitro mutagenized amino acid substitution constructs and their FS activity. The results of the amino acid substitution constructs are shown.



Forssman antigen expression. On the other hand, the introduction of both the S230G and R296Q substitutions enabled the human GBGT1 protein to synthesize Forssman glycolipid at the equivalent level of mouse FS. The pictures of cells transfected with these constructs are shown in Figure 5, as well as those transfected with either the original mouse construct, the original human construct, or the human R296Q construct (to be discussed later). The pictures at left and right represent the expression of co-transfected GFP protein and Forssman glycolipid, respectively.

These results also showed that none of the remaining 70 amino acid substitutions in the human GBGT1 gene were responsible for Forssman negativity. The codon 230 corresponds to the codon 235 of the human A/B transferases, and the glycine and serine residues are present in A and B transferases with GalNAc and galactose specificity, respectively. Therefore, we concluded that the glycine residue is crucial for the α1,3-GalNAc transferase activity of FS. The fact that only 2 amino acid changes reinstated FS activity may suggest that the inactivation of the GBGT1 genes in anthropoid apes was recent in evolution.

Discussion 

The biosynthesis of Forssman glycolipid is complex and requires a multi-step glycosylation. Deficiency at any step may result in the failure to synthesize this antigen. For instance, non-functional alleles are known to exist for the B3GALNT1 gene encoding β1,3-N-acetyl-D-galactosaminyltransferase 1, which catalyzes the biosynthesis of globoside, the acceptor substrate of FS. Those individuals who are homozygous for the non-functional alleles cannot produce globoside
Figure 5 | GFP and Forssman glycolipid antigen expression of COS1 cells transfected with key constructs. The left and right panels show the pictures of GFP and Forssman antigen expression, respectively.

and exhibit the phenotype of the P blood group system. Therefore, the inactivating mutations of the B3GALNT1 gene can also be regarded as a cause of Forssman glycolipid antigen negativity. However, the combined frequency of such non-functional alleles is extremely low (<0.001). As shown in Figure 3, a nonsense mutation (Tyr121Ter) is known to exist for the GBGT1 gene as an SNP (rs35898523). With a MAF of 0.048, this mutation causing early termination cannot be ignored. However, its frequency is also too low to explain the human Forssman antigen negativity because the chance of having non-functional recessive alleles in a homozygous manner is small. In addition to the nonsense SNP, there are an additional 38 non-synonymous SNPs, 14 synonymous SNPs, and 3 and 16 SNPs in the 5' upstream and 3' downstream untranslated regions of mRNA, respectively, of the human GBGT1 gene. It is possible that some of these SNPs may also affect the functionality of human FS. However, the two inactivating missense mutations we identified are very common and present in almost 100% of individuals, which makes these the most likely and sufficient causes of Forssman antigen negativity in humans.
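The argument that the nonsense SNP is too rare to account for Forssman negativity can be made concrete with a back-of-the-envelope calculation. The sketch below assumes Hardy-Weinberg proportions, which the text does not state explicitly, and the function name `homozygote_frequency` is introduced here purely for illustration.

```python
def homozygote_frequency(allele_frequency):
    """Expected frequency of homozygotes for an allele, assuming
    Hardy-Weinberg proportions (an assumption, not stated in the text)."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return allele_frequency ** 2


maf_nonsense = 0.048  # MAF of rs35898523 quoted in the text
expected = homozygote_frequency(maf_nonsense)
print(f"Expected homozygote frequency: {expected:.6f}")
```

Under this assumption only about 0.23% of individuals would be homozygous for the nonsense allele, far short of the near-universal Forssman negativity observed in humans.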

The categorization of species into Forssman-positive and negative is well founded because Forssman's results were confirmed and expanded by others. Our results presented in this paper also support this classification. However, this categorization is not absolute. Like ABO, intra-species polymorphism of the Forssman antigen expression exists. An example has recently been reported in individuals with the rare ABO subgroup Apae. Those individuals possess the GBGT1 gene with the c.887G>A [p.Arg296Gln] substitution and weakly express Forssman antigen on their erythrocytes. Because the frequency of the Apae subgroup is exceptionally low, no cases are deposited in the SNP database. In our DNA transfection experiments the human R296Q construct having the same mutation resulted in the appearance of some cells with Forssman antibody reactivity (Figures 4C and 5), corroborating the presumption that the germline mutation confers weak FS activity to the GBGT1 protein with the substitution. Our results have also proven that the mutant allele is dominant over the regular allele in its functionality. The Forssman antigen on erythrocytes in those Apae individuals may be produced by those cells. Alternatively, like the Lewis antigens, the glycolipid produced in other tissues may be adsorbed onto their erythrocytes.

In addition to intra-species polymorphism, somatic changes may 
also enable human cells to produce Forssman glycolipid. The 
expression of Forssman antigen has been reported in cancer cells 
and tissues, including metastatic tumor of biliary adenocarcinoma 
in liver^^, gastric and colonic mucosa and tumors of these tissues^^"^^, 
lung and lung carcinoma^^, and the MRK-nu-1 breast carcinoma cell 
line, the PC10 lung carcinoma cell line, and the KATOIII, MKN45, and 
MKN74 gastric carcinoma cell lines^^. Questionable specificity of 
the antibodies used in the immunological detection of Forssman 
antigen may account for the findings in some studies. However, 
Forssman glycolipid is undoubtedly expressed in certain human cells 
and tissues because its structure was chemically determined^^. Together 
with the fact that the R296Q polymorphism confers Forssman antigen 
positivity in the Apae subgroup individuals, the results of our in vitro 
mutagenesis study suggest that either this or the S230G substitution 
alone may grant the human GBGT1 protein weak FS 
activity. However, no such somatic mutations of the gene have been 
reported in the Catalogue of Somatic Mutations in Cancer 
(COSMIC) database. Therefore, other mechanisms may be responsible 
if changes in the GBGT1 gene cause the appearance of Forssman 
antigen. The potential mechanisms may include overexpression of 
the GBGT1 gene mRNA, splicing variations, increased stability of 
mRNA and protein, and post-translational protein modifications. 
Additionally, alterations in specificity and activity of other 
glycosyltransferases, such as blood group A transferase, may allow aberrant 
glycosylation to synthesize Forssman glycolipid under certain 
conditions. Similarly, vestigial human FS may be activated to synthesize 
Forssman antigen if concentrations of UDP-GalNAc and globoside 
substrates far above threshold levels become available 
in the Golgi apparatus. This argument may be supported by 
previous findings that certain tumor tissues exhibit higher 
concentrations of UDP-GalNAc and UDP-GlcNAc^^ and that an increased 
level of intracellular UDP-GlcNAc activates O-GlcNAc transferase 
(OGT) and leads to enhanced O-GlcNAcylation of target proteins^^. 
The Forssman antigen expression also seems to be influenced by cell 
density because more Forssman-positive cells appeared after DNA 
transfection on the dish edges (top of each picture in Figure 5), where 
cells are more crowded. Now that the basis of the Forssman antigen 
negativity in humans is clarified, the elucidation of molecular 
mechanisms leading to Forssman antigen expression in the 
Forssman-negative human species may not be too far off. 

Methods 

Sequence analysis. The nucleotide and deduced amino acid sequences of the 
annotated GBGT1 genes from various species were extracted from the Ensembl 
database. Additional homologous sequences were identified by BLAST searches of the 
sequences deposited in the Ensembl and GenBank databases, using the nucleotide and 
deduced amino acid sequences of known GBGT1 genes in the evolutionarily related 
species as query sequences. We used MEGA5 software (version 5.05) for 
sequence alignment (http://www.megasoftware.net/)^^. 

Preparations of mouse and human GBGT1 gene expression constructs, their 
chimeric constructs, and in vitro mutagenized amino acid substitution constructs. 
The plasmids containing the human GBGT1 cDNA and the mouse GBGT1 cDNA 
were purchased from Open Biosystems (Lafayette, CO). Compared with the 
reference sequence (GBGT1-001, ENST00000372040), the human GBGT1 cDNA 
sequence contained 2 SNPs: T residues were found in place of Cs at nucleotides 58 
and 987 of the coding sequence (rs2073924 and rs17853056, respectively). The former 
resulted in the Leu20Phe substitution, whereas the latter did not change the amino 
acid sequence. After PCR amplification using forward and reverse oligonucleotide 
primers that were tagged with the EcoRI and BamHI restriction enzyme cleavage sites, 
respectively, the coding regions of the GBGT1 cDNAs from the two species were 
cleaved with those enzymes and ligated with the EcoRI-BamHI fragment of the 
eukaryotic expression vector pSG5 (Stratagene, CA). The unique cleavage site for the 
restriction enzyme KpnI pre-existed at equivalent locations of both human and 
mouse cDNAs, whereas the NcoI site was only present in the mouse sequence. We 
created an NcoI site at the corresponding position of the human gene without 
modifying the amino acid sequence, by oligonucleotide primer-mediated in vitro 
mutagenesis through two rounds of PCR reactions as previously described^\ 
Similarly, the cleavage sites for BsiWI and MluI enzymes were also created at the 
corresponding sequences of the human and mouse genes without changes in their 
amino acid sequences. Next, a variety of chimeras possessing different portions of 
functional mouse FS and non-functional human FS were produced by shuffling the 
equivalent restriction fragments between those two species. Later, the amino acid 
substitution constructs were prepared by oligonucleotide primer-mediated in 
vitro mutagenesis. The nucleotide sequence of the GBGT1 gene was determined for all 
of the constructs. Only those without any unintended non-synonymous mutations 
were used for the subsequent DNA transfection experiments. 
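
The relationship between the coding-sequence positions and the amino acid positions quoted in this paper can be checked with simple arithmetic. The following is a minimal sketch, assuming 1-based numbering from the A of the start codon (the usual convention for "c." positions); the function name is hypothetical.

```python
from typing import Tuple


def codon_of(cds_position: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions are 1-based")
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


# Positions mentioned in the text: the two SNPs in the purchased human cDNA
# (58 and 987) and the two inactivating missense mutations (688 and 887).
for pos in (58, 987, 688, 887):
    codon, offset = codon_of(pos)
    print(f"c.{pos}: codon {codon}, position {offset} of 3")
```

Nucleotide 58 falls at the first position of codon 20, matching Leu20Phe, and nucleotides 688 and 887 fall in codons 230 and 296, matching Gly230Ser and Gln296Arg. Nucleotide 987 is the third position of codon 329, where substitutions are frequently silent, in line with the statement that this SNP did not change the amino acid sequence.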

DNA transfection and immunostaining. DNA from the GBGT1 gene expression 
constructs and their derivatives was used for transfection. Forssman-negative but 
globoside-positive COS1 cells from African green monkey kidney were used as the 
recipient. These cells were previously used to clone the canine GBGT1 gene cDNA 
encoding the functional FS, together with the MDCK II dog kidney cell cDNA 
expression library^ \ The Lipofectamine 2000 reagent (Life Technologies, Carlsbad, 
CA) was used to mediate DNA transfection. The cells were plated in a 96-well plate, 
and DNA transfection experiments were performed following the protocol provided 
by the manufacturer. In order to determine the efficiency of DNA transfection and the 
capacity of the transfected cells to express the transgenes, an equal amount of the 
pEGFP plasmid that expresses a red-shifted variant of GFP was co-transfected. Three 
days after transfection, the cells were fixed with 4% paraformaldehyde, washed with 
PBS, and dried. Immunostaining was later performed to detect the expression of 
Forssman antigen using a rat monoclonal antibody against murine Forssman 
glycolipid antigen (Clone FOM-1 from BMA Biomedicals, Switzerland). 
Biotin-SP-conjugated AffiniPure Goat Anti-Rat IgG + IgM (H + L) secondary antibody from 
Jackson ImmunoResearch Laboratories (West Grove, PA) and the Vectastain ABC kit 
and DAB Peroxidase Substrate kit from Vector Laboratories (Burlingame, CA) were 
used. GFP-positive cells were counted under a fluorescence microscope. Forssman 
antigen-expressing cells were also counted visually after immunostaining. The ratios 
of the cells positive for Forssman antigen to the cells positive for GFP were 
calculated, and they were normalized so that the value is 100 for the cells 
transfected with the original mouse GBGT1 gene expression construct. For several 
crucial constructs, DNA transfection experiments were repeated using independent 
preparations of DNA from different clones, in order to exclude the possibility of 
unintended mutation(s) in the vector sequence. Cell pictures were taken at 50X 
magnification, using a Leica DMIL Fluorescence Microscope and a Leica D-Lux3 
digital camera, before (for GFP expression) and after (for Forssman antigen 
expression) the treatment with the DAB staining reagent for color development. 
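
The normalization of cell counts described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the counts in the usage example are invented for demonstration, not data from this study.

```python
def relative_fs_activity(
    forssman_positive: int,
    gfp_positive: int,
    mouse_forssman_positive: int,
    mouse_gfp_positive: int,
) -> float:
    """Forssman/GFP cell-count ratio, scaled so the mouse construct equals 100."""
    if gfp_positive <= 0 or mouse_gfp_positive <= 0:
        raise ValueError("GFP-positive counts must be positive")
    if mouse_forssman_positive <= 0:
        raise ValueError("the mouse reference must have Forssman-positive cells")
    construct_ratio = forssman_positive / gfp_positive
    mouse_ratio = mouse_forssman_positive / mouse_gfp_positive
    return 100.0 * construct_ratio / mouse_ratio


# Hypothetical counts: 80 of 100 GFP-positive cells stain for Forssman antigen
# with the mouse construct, and 20 of 100 with a test construct.
print(relative_fs_activity(80, 100, 80, 100))  # mouse reference itself
print(relative_fs_activity(20, 100, 80, 100))  # test construct relative to mouse
```

Dividing by the GFP-positive count corrects for well-to-well differences in transfection efficiency. Scaling to the mouse construct puts all constructs on a common relative scale.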